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Abstract 

I discuss optimized data analysis and Monte Carlo methods. Reweight- 
ing methods are discussed through examples, like Lee- Yang zeroes in the 
Ising model and the absence of deconfinement in QCD. I discuss reweighted 
data analysis and multi-hystogramming. I introduce Simulated Temper- 
ing, and as an example its application to the Random Field Ising Model. 
I illustrate Parallel Tempering, and discuss some technical crucial details 
like thermalization and volume scaling. I give a general perspective by 
discussing Umbrella Methods and the Multicanonical approach. 



Lectures given at the 1996 Budapest Summer School on Monte Carlo Methods. 



cond-mat/9612010 



1 



1 Introduction 



In the following I will give an introduction to optimized Monte Carlo methods and 
data analysis approaches. We will see that the two issues are very interconnected, 
and we will try to understand why. I will try to keep the same style one has while 
lecturing, trying really to explain all important points in some detail. Even the 
figures will be mostly copied from my transparencies to this text: I hope that at 
least partially that will fulfill the goal I have in mind. 



Reweighting methods are based on simple, basic idea: when you run a numerical 
simulation at a given value of the inverse temperature f3 and you measure some 
set of observable quantities (including the internal energy of the system) you 
are learning far more than simply the value of 



and your ignorance about it (the statistical error). Expectation values of the 
observable quantities at j3 turn out to be only a small part of the information 
you are gathering (if you store the right numbers!). 

The partition function of our statistical system at f3 can be expressed as the 
integral over the energetic levels allowed to the system 



where N(E) is the energy density. An accurate numerical simulation at (3 gives 
you information about N(E), and this information can be used in many ways 
we will discuss in the following. By accurate numerical simulation we mean here 
that in order to make the information about N(E) meaningful we need a large 
sample, and that the problem of controlling the statistical and systematic errors 
is here non-trivial. This is also the path to the definition of improved Monte 
Carlo methods like tempering, that we will discuss in the following. 

In the following we will be discussing reweighting techniques (also as an intro- 
duction, as we said, to improved Monte Carlo method, to make clear the logical 
path that leads us there). After the work of [10], [29] we will discuss first about 
the simple Ising model and the Lee- Yang theorem in (2.1) and then about the 
existence of a phase transition in a 4 dimensional SU (2) Lattice Gauge Theory in 
(2.2). By doing that we will give a very schematic definition of a Lattice Gauge 
Theory. We will next discuss the use of this approach for improving the qual- 
ity of the analysis of numerical data. We will introduce hystogramming in (2.3) 
(this is a classical development, based on classical work in molecular dynamical 
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simulations by [41], [8], [33], and on the more recent work contained in [10], [29] 
and [12]), and the work on multi-hystogramming of [13] in (2.4). 

So, in order to summarize again the physics side of the next section, we 
will start by discussing Lee- Yang zeroes in the 3d Ising model, by clarifying 
a few crucial issues about phase transitions. We will discuss how to compute 
critical exponents from there. Then we will discuss about the SU(2) Lattice 
Gauge Theory zero analysis, and we will see how that helps in establishing that 
confinement of quarks in colorless particles survives in the continuum limit. At 
last we will give details about data analysis. We will also try to clarify the path 
that will eventually lead this technique to be promoted, from a data analysis tool, 
to a (sometimes very effective) simulational method. 

2.1 Lee- Yang Zeroes 

Someone interested in the numerical study of critical phenomena should always 
consider as fundamental the fact that phase transitions only exist in the infinite 
volume limit. On the finite volume (i.e. inside all of our computers) there are no 
phase transitions. Let us start by clarifying this point a bit. 

We are working in the complex f3 (or T) plane. We consider a compact 
configuration space (spin variables cannot diverge: the Ising case with ±1 values 
is a very good example, and also a compact SU (N) gauge theory with SU (N) 
matrices is): what we will discuss can be also proved under far more general 
assumptions, but that makes things more evident. The absolute value of the 
partition function Zp is limited from above by 



where the integral is over the configuration space dC, V c = J{dC} is the volume 
of the configuration space and by \H\ we denote the maximum value H({C}) can 
assume when considering all possible configurations: 



This is also true for complex (3 values (in the case where the spin variables can 
take only discrete values Z is a linear combination of exponentials). 

What are the properties of the partition function Zpl A reasonable Zp, which 
is supposed to describe a physical system, is an analytic function in the Re(/3) > 
half of the complex plane. And what happens to the free energy, — log Zpl If 
Z is an analytic function log Zp can be singular only where Zp = 0. That makes 
clear the role of the zeroes of the partition function. For T (and the field h) 
belonging to R + Zp is a sum of positive contributions, and cannot have zeroes in 
the finite volume V. But, as fig. (1) should visualize, in the V — > oo limit zeroes 
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Figure 1: The complex T plane. Zeroes at finite volume (empty dots) pinch the 
real axis at T c (filled dot) in the infinite volume limit. 

that are located for finite V at complex values of T e C can approach the real 
T axis, generating a non-analyticity of the free energy. 

The same kind of reasoning can be applied to the behavior in the magnetic 
field h: here for the Ising model the Lee- Yang theorem holds: The zeroes of the 
partition function are located on the imaginary h-axis, or on the unit circle of the 
complex activity plane. In the finite volume there is a finite number of zeroes, 
and in the infinite volume limit the zeroes condense on part of the unit circle. 

There are no theorems about constraining the zeroes in the complex /3-plane. 
The main theoretical results on this issue have been obtained in [22]: 

• The complex zeroes close to the (to be) critical point accumulate on curves; 

• In 2d they cross the real axis at T c at right angle. 

We will assume that the same situation holds (with a generic crossing angle 
7i — 0) in d > 2. We expect two lines of zeroes in the upper and lower positive 
T complex plane (symmetric because of the reality properties of the partition 
function Z), that pinch the real T axis at T c . The singular part of the free energy 
above and belove the transition has the form 



where a is the usual critical exponent and A^ +S) and ) are the specific heat 
critical amplitudes. Matching the lines of zeroes gives the condition 
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In 2d (f) = ir/2 gives that = A {+ \ as it is. 

So, we have scaling laws for the position of complex zeroes of the partition 
function. Finite size dependence can be derived in the usual way, and we will be 
able to try a numerical experiment to determine the critical behavior. 

The numerical simulation will be based, as we told, on the fact that from a 
Monte Carlo simulation at a fixed (3 value we can gather information about other 
values of (3 (even complex values). Running directly simulations at complex (3 
values is far from straightforward, and we will find this approach quite direct: 
the same approach will lead us to introduce very powerful Monte Carlo methods. 
Here we will use the method to compute zeroes of at f3 = (Re(f3 ), Im(/3 )), 
with small Im(/3o) (this is the region that is more interesting from the scaling 
point of view and, luckily enough, is also the one that we can access by numerical 
methods). When we will apply the method to real, small (3 increments we will 
get an useful tool for data analysis. 

We can start being specific (following [29]), and consider the 3d Ising model, 
with spin variables <7j = ±1, i a 3d label of a lattice site, and an action S 

s = -£(^-i), (7) 

where the primed sum runs on first neighboring spin couples on a simple cubic 
lattice. The partition function Zp is written as 

Zfs = J2 e-" Si{C}) ■ (8) 

{C} 

At the time of this work an exact solution had been obtained for lattices of 
size up to 4 3 , [37]: by exact enumeration one is counting in this case O(10 19 ) 
configurations! Already a lattice of 5 3 sites cannot be exactly enumerated with 
today computers. As we said our statistical technique will be based on the fact 
that we can express the partition function as a sum over the energy levels of the 
system: 



Y,N(E)e-? E , (9) 



where in our case E = 0, . . ., d 3 . The instructions are: run your Monte Carlo 
simulation at f3, and record the normalized energy distribution function F E {(3) 
(this is the number of configurations you find at each energy value, normalized 
to one). One has that 
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E 

Now if we compare two different (3 values (we have run at (3 and we are interested 
in results at f3) we see from (10) that 

N E = F E {P)Z{P)e? E = F E 0)Z0y E . (11) 

We can notice already now that if we stop here, assume that we are dealing 
with two real (3 values, and we use our best numerical estimate for the partition 
function, we get the reweighting formula 

Fm = FM ^ E ea _ mE -. (12) 

as we will see better in the following, and we are only anticipating here, from the 
simulation at (3 we can get expectation values at (3 (if (3 is close enough to (3, and 
the statistical accuracy is good enough not to make the exponential suppression 
kill the signal). But at the moment let us go back to the complex zeroes, and 
rewrite (11) as 

F E {(3) = F E ((3) e ( ^H • (13) 



Summing over energies and using the normalization condition of (10) we find 

m -Y,W)e$-® E . (14) 



So, we are running a Monte Carlo simulation at (3, and we are computing by 
a numerical estimate F E {(5). We want to obtain information about the analytic 
structure of Zp at (3 = rj+i^. The exponential factor in (14) will give, for complex 
(3, two contributions: the first will be an oscillating factor, that is non-zero for 
Im(/3) ^ 0, 

cos(E£)+iam(EZ) (15) 

(and we remind that E is here a number of order volume, not of order one). The 
other contribution is the exponential damping 

e -(v-fiE . (16) 
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since E = 0(V) the damping is severe. Since we are looking for zeroes and we 
know we cannot get zeroes on the real axis, it is a good idea to compute 



Zf3 _ Z/3 f ZR c p \ 

Z Re(3 Z p\ Z p J 
Ee F E 0)e~^ E (cos(gO + i sin(£fl) 

An easy way for a numerical search of zeroes of (17) is to look for minima of 
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F E (/3)e-^ E cos(EO) + [Ee F E (P) e -^ E sin(££ 
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The numerical simulations of [29] were using volumes from N 3 = 4 3 (in order to 
check the exact solution) to 8 3 . (3 was chosen as close as possible to the actual 
location of the zero in order to minimize the exponential damping. 

We have already said that cos(-E^) = cos(el / ^), where e is the energy density, 
of order 1, is highly oscillating, and makes impossible to compute the location of 
zeroes with large imaginary part. One nice fact is that by finite size scaling we 
expect that the distance 5n of the first zero on a lattice of linear size N scales as 

5 N ~ N~» : (19) 

Equation (19) can be used to estimate v from the rate at which zeroes approach 
the real axis. It also tells us that the oscillations due to the cosine term do not 
increase like N d , but better like L d ~~: an exponent of 1.4 instead of 3 for the 3d 
Ising model. This helps. 

In fig. (2) we show the scaling of the distance of the first zero from the real 
axis (one can do the same for farther zeroes, but the precision is smaller). We 
use here the variable u = e~ Al3 . We denote by u l N the position of i-th zero on a 
lattice of linear size N, and plot u% versus iV in double log scale (the figure here 
is not precise, and it is only meant as a graphical reconstruction of the data: the 
reader interested in raw numbers should consult directly [29]. 

The remarkable linearity of the plot denounces a good scaling behavior already 
for small lattices. From these data (numerical archeology at today, but we are 
discussing about the method, not about the numbers!) one gets the reasonable 
estimate v ~ .620 ± .010 (the best estimate for v at the time of this simulation 
was the far superior v ~ .631 ± .001 by Le Guillou and Zinn- Justin). But with 
this method, for example, we can get a very straightforward measurement of the 
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Figure 2: — log lmu% versus log L. 



ratio of the critical amplitudes (a quantity that is not easy to obtain in the series 
expansion approach). We measure, as we have discussed before, the angle at 
which the complex zeroes depart from the real u axis in the infinite volume limit. 
One has that 

tan 2-oU = V ; ^ . 20 

sin(7ra) 

So, for example, in [29] we got an angle of 55.3 ±1.5 degrees, and an amplitude 
ratio of 0.45 ± 0.07. This is an accurate and reasonable result. The method 
works 1 ! 



2.2 Complex Zeroes in a Non-Abelian Four Dimensional 
Lattice Gauge Theory 

In the former section we have described the method one can use to locate complex 
zeroes of the partition function. We will discuss now the physical problem for 
which this technique was first introduced by ([10]): I am basically taking this 
chance to give a ten lines crash course in Lattice Gauge Theories, LGT (that here 
will mainly be a fancy Statistical Mechanics, constructed by exploiting a powerful 
symmetry). We will be discussing about locating complex zeroes in a non-abelian 
Ad LGT, i.e. about one of the ways of getting numerical evidence to show that 
there is no deconfining phase transition in the infinite volume limit. While writing 

1 Obviously there are more recent simulations that follow these lines and are more precise: 
see for example [5], and references therein. 
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Figure 3: From gauge variables to Wilson loops. See text. 

these notes I do not know what will Peter Hasenfratz decide to include in his 
contribution to this volume ([17]): in any case it will be an interesting reading 
for the physicist interested to LGT. If in need of more material, you could look 
at one of the books that are available, [40, 35], to the classical lecture notes by 
[25], to the Les Houches notes by [36]. 

In a Lattice Gauge Theory variables leave on links (as opposite to sites for a 
normal Statistical Mechanics) of a d-dimensional lattice (simple cubic, for simplic- 
ity, in the following), and we will label them by U^(n), where n is a rf-dimensional 
site label, and \i denotes one of the lattice directions (we show the link variable 
in fig. (3.A)). Different gauge theories are characterized by different kind of vari- 
ables. In the relevant case of Quantum Chromo Dynamics, the theory of strong 
interaction of elementary particles, they are SU(3) matrices (here we will consider 
a simpler theory with many similar features, the one of SU{2) 2 ■ 2 matrices). In 
the case of a SU(M) gauge group U is a M • M matrix with U • W — 1, and 
det(U) = 1. 

The Boltzmann equilibrium probability distribution can be written as 

P B ~eP^p Up , (21) 

where the sum runs over all plaquettes (the smallest closed circuits, see (3.B)) of 
the lattice, and 

Up = U,(n)U u (n + fi)Ul(n + u)Ut{n) , (22) 

i.e. one takes on each elementary plaquette the product of ordered link matrices 
(by defining U^n) = Ul^n + ft)). 
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This theory has a dramatically large invariance, known as gauge invariance: 
if we pick up arbitrarily a group element g(n), that we can choose independently 
on each site, and we transform all the link variables under 

U^n) ^ g{n)U ll {n)g\n + p) , (23) 

the action of the theory does not change (and do not change all observable quan- 
tities, i.e. products of links on closed loops, see (3.C)). In the Ising model the 
crucial Z 2 symmetry is only a global symmetry: the theory is symmetric under 
inversion of all o~(n) variables. Here, on the contrary, we have the right to select 
an independent frame rotation at each lattice point, and the theory does not 
change. Such gauge symmetry is exact in the lattice theory. This is the crucial 
step of the Wilson approach to LGT: in the same way in which the 2c? Ising 
model Onsager solution is described by Majorana fermions at the critical point, 
where the correlation length becomes infinite and details of the lattice structure 
are forgotten, the critical limit of lattice QCD is the usual continuum QCD, the 
theory of interacting quarks and gluons. The fact that continuum gauge invariant 
is exactly preserved on the lattice (as opposed, for example, to Lorentz invari- 
ance, that is obviously broken by lattice discretization) is a crucial point of the 
approach. 

We have said already, and we will not enter in details, that products of link 
variables on closed loops are the observable quantities. 

We also remark that the inverse temperature (3 that appears in the Boltzmann 
distribution (21) is connected to the coupling constant of the continuum gauge 
theory one recovers in the limit of small lattice spacing: 

PstM ^ 9gt , T stM ^ g 2 GT ■ ( 24 ) 

The theory has a phase transition at T = (here g — > 0), where the correlation 
length diverges (exponentially in ^). As usual in this continuum limit the lattice 
structure becomes irrelevant. 

For high values of T it is easy to prove the relation we have depicted in (3.D): 
a Wilson loop (the product of oriented link variables over a closed loop) of large 
size R ■ M decays as the exponential of minus the string tension a times the area 
of the loop in the confined region. If quarks are confined and cannot be separated 
out from color singlet states (the physical particles, mesons and baryons) we have 
an area decay of large Wilson loops. So, the nice part is that, as we said, it is easy 
to prove that Lattice QCD is confined in the high T region (where the theory is 
very different from the continuum theory). The bad part is that one can prove 
that for all LGT, including the Lattice QED: since electrons are known to be 
free in the continuum theory, if everything goes right in this case something will 
have to happen. What happens for example in the case of QED is that a finite 
T phase transition separates two regions, the confined, unphysical one, and the 
deconfined physical theory. One has to show that the same does not happen in 
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Figure 4: Lines of zeroes of the imaginary part (thin curves) and of the real part 
(thick curves) of the ratio of partition functions. 

Lattice QCD, and that the theory does not undergo a phase transition that would 
destroy confinement. 

Our analysis of lattice zeroes was studying this problem. The technique is 
exactly the same we have described in the former section. In fig. (4) we draw 
curves with 




(25) 



and curves with 



(for the exact curves the reader should consult the original reference, [10]). It 
is clear from the figure one can determine the zero with good precision. Since 
one finds a zero quite far from the real (5 axis, and it does not approach the real 
axis for larger lattice sizes, one does not expect a real phase transition, but is 
measuring a transient phenomenon that is irrelevant as far as the real continuum 
limit is concerned. It is clear that the evidence we have discussed here is the same 
one exploits when using finite size scaling techniques, looking for example at the 
behavior of a peak of the specific heat. It is interesting that one can directly 
study the position of complex zeroes of the partition function. 
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2.3 Data Analysis 



In equation (12) we have already seen that our approach can be used to deduce 
information at (5 after running a simulation at (3. We can generalize (12) by 
noticing that we can sample also magnetizations m (so we can reconstruct all 
moments of m). After each measurement we write down the energy and the 
magnetization of the configuration (that we assume to be at thermal equilibrium). 
We have that 

(f3-f3)E+(h-h)m 

F E m (f3, h) = F E m 0, h) ~ - — - . (27) 

In [12] one finds a very nice picture showing how well the method can work for 
example for the case of the 2d Ising model. 

The method we have described here is very useful, for example, when one 
wants to measure the finite size scaling behavior of physical observable quantities. 
Let us consider for example the specific heat Cy. The maximum value Cy takes 
on a finite lattice of linear size L, C™ ax (L) scales at a critical point as 

qr x (£) — . (28) 

The main problem is in the fact that we only measure for a discrete set of values 
of the temperature T (by normal MC or by using a T-changing Monte Carlo 
procedure, see later in this notes). A priori we do not know at which value of T 
on a given lattice the specific heat takes its maximum value, and such T value 
depends on L: 

T max (L) | CV(T max (L), L) = C^ ax (L) depends on L . (29) 

It can be difficult to find the correct value of T max (L): it is a delicate fine tuning 
process that has to be repeated for each different L value. A wrong determination 
of T max (L) can generate a very misleading effect. Let us look at figure (5). The 
empty dots represent (gedanken) measurements of the specific heat taken with a 
linear size L, at the set of T values which appear in the figure. The filled dots 
are measurements on a lattice of larger linear size L', taken at the same T values. 
One would assume that the finite size scaling behavior is given by the scaling of 
the two measured points with higher value of Cy. But maybe the real maximum, 
on the larger lattice, is where the triangle is: there the scaling could be very 
different from the one we found on the points we measured, but sadly we did not 
measure on this point. We want to stress that this effect creates real practical 
problems in numerical simulations. 

The pattern of data analysis we have discussed here solves this important 
problem. Statistical error can be kept under control (by using for example jack- 
knife and binning techniques), and numerical studies of scaling become (without 
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Figure 5: The specific heat versus T for a typical finite size scaling study 
suffering of troubles that can be cured by reweighting. Empty dots are for the 
smaller lattice, filled dots for the larger lattice, and the triangle for the point we 
did not measure and we should have measured. 

further computational expenses, since we already had the information) a real 
quantitative tool. 

2.4 Multi-Hystogramming 

The idea of reweighting can be pushed forward. [13] introduced the multi- 
hystogramming. The crucial step is to realize that you can patch data from 
different simulations at different (3 k values to reconstruct all of the relevant (3 
region. 

So, we have to sum up histograms. The delicate point is how to sum them 
up, i.e. how to determine the weights to use in constructing linear combinations 
of the different entries. The way discussed in [13] is to determine the weights by 
minimizing the statistical uncertainty over the final estimate for Pp(E). Let us 
call N k (E) the data histogram for the k-ih Monte Carlo run, at /3 k . Let us define 
9 k = 1 + 2r k , where r k is the estimated correlation time at (3 k , and n k the number 
of sweeps used in the k-th Monte Carlo run. One finds that the parameters {g n } 
can be determined self-consistently by iteration from 

E* =1 N k (E) e^ E 9 k l 
mN Ef =1 n 3 eft.*-* 0? ' 
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We have denoted by R the number of Monte Carlo runs. The method works well, 
and at today it can be considered as a standard analysis tool. 

3 Improved Monte Carlo Methods 

We will discuss here about improving Monte Carlo methods. We will be mainly 
talking about tempering, by [30] (see section (3.1)), where you add a second 
Markov chain to the usual Metropolis chain: you let (3 vary, trying in this way 
to make easier for the system to move across deep free energy valleys separated 
by high free energy barriers. We will also try to discuss about general issues like 
thermalization, that are of crucial importance already when discussing simple 
Monte Carlo methods, and that turn out to be even more delicate issues here. 

Eventually one is looking for a very effective, very simple simulation method. 
Somehow when you start the numerical study of a statistical system you work at 
two different levels. At first you run a quick and not very clean Monte Carlo, to 
understand the first physics ideas 2 . Only after that you set up complex simula- 
tional procedures, data analysis, error determination. It would be nice if the first 
phase we have just describe could already be based on something more powerful 
that the usual Metropolis approach: I hope the reader will be convinced that 
maybe the Parallel Tempering approach (see section (3.3)) is the good candidate 
for this role. In Parallel Tempering there are no parameters to be tuned, no 
difficult choices to be done (but for the selection of a set of T values that can be 
done with some large freedom): it looks like the good thing. 

3.1 Simulated Tempering 

We will introduce here Simulated Tempering, [30], an improved Monte Carlo 
method that turns out for example to be very effective for simulating spin glasses 
(for further studies and applications of tempering see [11, 9, 47, 23, 7]). Later on 
we will discuss about Parallel Tempering, [43, 14, 19, 18], that turns out to be a 
better and simpler method (having at the same time these two advantages is a 
quite rare and appreciated feature). So we will discuss now some complex matter 
that we will eventually not need in the practical implementation of the method: 
we will do that since it helps in understanding some of the physical mechanisms 
that govern the scheme. These mechanisms are common to the most promising 
parallel tempering scheme. Parameter tuning is not needed in parallel tempering. 

2 The phrase typically used by G. Parisi to describe this approach is II buon giorno si vede 
dal mattino, that I would translate in english with It is already early in the morning that you 
can recognize a good day from a bad one. 
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Simulated tempering is a global optimization method: it can be seen as an 
annealing, generalized to T 7^ 0, with a built-in scheduling. This can be of large 
practical importance, since setting up the schedule is one of the most difficult 
and time taking parts of an annealing simulation. Tempering is very similar to 
annealing, but after the initial thermalization period the field configuration is 
always at Boltzmann equilibrium at one of the allowed f3 values. This phrase, 
that is a bit mysterious at this point, is important, and will be clarified in the 
following. 

There are many related techniques, strictly connected methods and needed in- 
troductive material. In first one needs to know about generalities of Monte Carlo 
methods (see for example the lectures in this book by [28]). Theory of multi- 
ple Markov chains is the mathematical basis one needs to clarify the theoretical 
aspects of the method ([43, 14, 19, 18]). 

Strictly related to tempering are the scaling approaches based on Umbrella 
Sampling, [44, 45, 15, 46]: we have already discussed the issue when illustrating 
data analysis reconstruction. It has to be noticed that many of the ideas we are 
applying now to numerical Statistical Mechanics and Euclidean Statistical Field 
Theory have been elaborated many years ago in the context of physics of liquids 
and of Molecular Dynamics. 

Multicanonical Approachesby [3, 4, 1, 2] are also strictly related to tempering, 
and we will also discuss about them in the following. Multicanonical methods are 
more ambitious and in many situations potentially more powerful than tempering 
(but they are a bit more complex): they can try and deal with first order phase 
transitions, that is not something you want to do with tempering (that works well 
for continuous phase transitions). We also note that different kind of tempering- 
like approaches have been proposed, for example in [24]. 

As we have discussed from the start of these notes Tempering has been built on 
reconstruction methods, [10, 29, 12, 13]: if we can use data at (3 for learning about 
expectation values at j3 maybe we can also use this information for speeding up 
the dynamics itself (again, see [28] for an introduction to Monte Carlo simulations 
and Markov chains). When you run a long simulation you could know much more 
than you believe (or much less, but we will discuss about that when talking about 
thermalization and correlation times). 

Let us start with describing Simulated Tempering. We want to equilibrate our 
statistical system with respect to the Boltzmann distribution 

P({a}) ~ e -^(W) . ( 31 ) 
We choose a new probability distribution, with an enlarged number of variables: 

nwhm), (32) 

such that, for each given choice of the {X}, P is Boltzmann with some given 
choice of (3. We will allow f3 to become a dynamical variable: 
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W - (W,{&}), (33) 

a — 1, . . . , A. We have allowed A fixed values for the f3 a variables. 

The mechanism we are introducing is very simple: the new equilibrium prob- 
ability distribution is 

Pe q ( W, {Pa}) * e-«W) +9Q > (34) 

where if is the original Hamiltonian of the problem. The g a are constant quan- 
tities, whose meaning will be elucidated in the following. They have to be fine 
tuned before running the equilibrium sampling, and they only depend on the 
value of a (only one g a is allowed for each /3 a value). 

The probability of finding a given value of a (i.e. the probability for a given 
P a value to occur during the run) is: 

p a = Za e^ = e 9<*-P«f<* ? (35) 
where z a is the partition function at fixed f3 a , 

J{da}e-^ H ^ , (36) 

and f a is the free energy of the system with fixed P a . (35) shows that the free 
parameters g a are connected to the free energy of the system. For making the 
probability of different P values to occur to be equal (i.e. the system to visit with 
the same frequency all allowed P values) we need to set 

9a = Pafa ■ (37) 

Since we do not know th f a that will amount to use for the g's the best available 
estimate for /. We will discuss in the following how to produce a good guess 
for the g a , that can also be improved systematically. Since eventually parallel 
tempering will not need any kind of parameters to be fine tuned we will not 
go in details about this issue (explaining what the g are in simple tempering is 
useful for reaching a better physical understanding of both tempering and parallel 
tempering). 

If we are mainly interested in studying the system at T = T we will allow for a 
set of T a , for example at constant distance, T = T, T\ = (T + 5), T 2 = (T + 25), 
... (see figure (6)). We will discuss how to select optimal values for 5 (this 
has to be done also in parallel tempering, but one does not need fine tuning). 
After the runs if you want you can use reconstruction schemes to use all of the 
information contained in all samples, but in this kind of approach this is typically 
not necessary (since we are looking for thermalized configurations in a complex, 
cold broken phase, and the main information is typically contained in the T 
data). 

The main physical ideas are 
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Figure 6: The choice of the T values to be used in the tempered simulation of 
a complex system. 

1. The system is frozen at low T. High energy barriers separate different free 
energy valleys. 

2. When the system warms up during tempering the free energy barriers get 
smaller. They disappear when the system crosses T c . 

3. When it cools down again it will probably explore a different local minimum. 

The method seems to work nicely for second order like phase transitions: for 
that to happen you need the broken state to be conformationally similar to the 
high T one. 

We will now analyze in detail a full updating sweep, in order to make the 
procedure clear. 

1) Sweep the full lattice, maybe s times (s > 1 could help) and run a full 
normal Monte Carlo (Metropolis, or cluster, or over-relaxed or what you like 
better) update for the {a} variables at fixed f3 a . 

2) Propose the update 



not to move with probability ^, since we are not interested in detailed balance in 
/3 space: see [28] about the importance of rejected moves). 

We accept the update with normal Metropolis weight. Here the factor in 
the exponential only changes because of the change in (3, since the configuration 
energy does not change: 




AS 



tempering 



((3 a , - P a )H{{a}) + (g a , - g a ) , 



(39) 
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where both the H and the (g a > — g a ) terms are constant. If AS <= we accept 
the P update, if AS > we accept it if a random number uniformly distributed 
in (0, 1) is smaller than e~ As . 

A good first guess for the g a can be deduced from (39) (then, as we said, the 
g's can be systematically improved). We can take the g such that in average AS 
is balanced: 

{9a> - 9a) = -{Pa' ~ Pa) ^ ~ , (40) 

that holds at first order in 5/3. We see that the g, by balancing for the free energy 
of the system, do not allow the system to follow in the lowest energy state and 
stay there forever. The first correction is of the form 

C%{5(3) 2 , (41) 

and keeping this term of order one implies that in the large volume limit 5(3 has 

_ i 

to be kept of order C v 2 , see also later. 

Now the crucial observation: at each /3 a value (after the initial thermaliza- 
tion) the system is always in equilibrium with respect to the usual Boltzmann 
distribution. The system is all the time (even at the moment of /^-transitions) at 
thermal equilibrium. 

The analysis of observable quantities is very easy. One just has to select all 
configurations which were flagged by a given f3 value: 

^All configurations &tp a ^conf , AO , 

® a Number of configurations at/3 Q 

How do we select a good set of {(3 a } values? Fig. (7) can help to give an 
idea. Let the first P(E) on the left be the one at the lowest T value (the one 
we are interested in thermalizing) . The center the one is at T + 5, i.e. at the 
second T a value. We request that the overlap of the two probability distributions 
is non-negligible. A configuration with an energy included in the colored region 
of the picture is typically a good configuration both at T and at T + 5 (it is here 
that problems connected to discontinuities, like in first order phase transitions 
make the method fail). So, 5 = T a+ i — T a has to be selected such that there 
is a non-zero overlap among the two energy probability distributions (as usual 
in Metropolis like methods that is connected to having a reasonable acceptance 
factor, of order 0.5, for the /3 a moves). That should also make more clear why 
we could like to do more than one {a} sweeps before updating /3: we want to 
avoid a series of jumps between adjacent (3 a values, and give to the system the 
possibility of moving at fixed (3 at the opposite extremum of the P(E) before 
trying changing f3 again. 

We repeat which are the basis of the physical mechanism we are interested 
in. We are thinking about systems with very high free energy barriers. Warming 
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Figure 7: P(E) for different values of T. 



the system up lowers the barriers, which disappear at T c . When cooling down 
(always staying at thermal equilibrium!) the system can fall in a completely new 
valley. 

Experimentally the method turns out to work very well for 3d Edwards- 
Anderson spin glasses. It is interesting to note that temperature chaos, [26, 
27, 38], should give troubles, but on the lattice sizes we have been able to study 
we did not get them: it is interesting to try and understand more about this is- 
sue. In the case of 3d Edwards- Anderson spin glasses by using parallel tempering 
(see (3.3)) we have been able to thermalize a 16 3 system down to 0.7T C ([32]). 
On a 24 3 lattice one is able to thermalize with a reasonable computer time (we 
are talking about single runs on one disorder sample taking order of one day of 
workstation) down to 0.9T C , while for lower T the situation gets more difficult to 
control. 

The method does not work, [21], for generic heteropolymers (with order 30 
sites, Lennard- Jones potential with quenched random couplings [20]), and this 
is probably connected to the fact that the glassy transition is in this case of 
a discontinuous nature: for true (small) proteins, on the contrary, the method 
seems to be helpful [16]. Also the method has been applied among others ([11, 7]) 
by [9] to the Ad disordered Heisemberg model, by [23] to the 2d EA spin glass 
(by noticing one of the important advantages of the method, i.e. the trivial 
vectorization and parallelization of the scheme), and by [47] to CP N ^ models. 
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Figure 8: The measured specific heat Cy as a function of (3. (3 = 0.26, in the 
cold region, is the point on which we focus. 

3.2 Random Field Ising Model 

The case study we have done in our first tempering paper, [30], was discussing 
the Random Field Ising Model, and it is ideal to illustrate in some more detail 
the main feature of the method: we will discuss it here. The model is defined by 
the Hamiltonian 

H = - + h i a i > ( 43 ) 

where o~i = ±1, the sum with two indices runs over first neighboring sites on a 
simple cubic lattice in d dimensions, and the local external fields hi are quenched 
random variables, which take the value ±|/i| with equal probability. 

We have taken d = 3, V = 10 3 sites and \h\ = 1, sitting close to the critical 
region, on the cold side (we will focus here on studying f3 = .26). Fig. (8) gives an 
idea about the critical region. The specific heat has a maximum close to P — .25. 
In these first runs in most cases we allowed the system to visit only 3 (3 values, 
(Pa = (-24, .25, .26)), and sometimes 5 P values: for the Edwards Anderson model 
in the most recent runs of [32] on a 24 3 lattice we use up to 50 P values. The 3 
P values we have given above are selected such to span from the cold phase to 
the warm phase. This is the general principle: the system has to be allowed to 
travel, in P space, from the cold phase, that has a physical interest, to the warm 
phase, where free energy barriers disappear and correlation times are short. 

In fig. (9) we plot the P a values the system selects in the course of the 
dynamics. Notice that the system is not getting trapped in the cold or in the 
warm phase: acceptance of the P swap is easy, and the system easily travels 
among the two phases. 

The magnetization measured from a typical normal, fixed P Monte Carlo 
metropolis runs is shown in fig. (10): here the correlation time, since one never 
sees a flip to the opposite magnetization time, seems short. That taking fig. 
(10) at face values would be misleading is clear from the magnetizations from a 
tempering run shown in fig. (11): here tempering is allowing the system to flip 
among the ±m states, and it is clear that configurations with positive m have a 
relevant weight in the equilibrium distribution. 
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Figure 9: (3 a as a function of the Monte Carlo tempering time. 
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Figure 10: Magnetization as a function of Monte Carlo time, at (3 — 0.26, for 
normal Metropolis algorithm. The system never tunnels to a positive magneti- 
zation value. 
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Figure 11: Magnetization versus Monte Carlo time for tempering. Here all (3 
values are included: the measured magnetization are related to configurations 
that are at Boltzmann thermal equilibrium with different temperature values. 
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Figure 12: As in fig. (11), but we have selected only configurations with (5 = 
0.26. 

In fig. (12) we plot the magnetization of the configurations that got a (3 
value of 0.26. Here one clearly sees that the state with positive m is allowed at 
(3 = 0.26 with a probability smaller than for the negative m state but definitely 
non-zero. These last figures give the main idea. In a non-disordered Ising model 
flipping from the minus state to the plus state would be irrelevant, if we are not 
interested in studying details of the tunneling dynamics: we know by symmetry 
that to the minus state corresponds an identical plus state. But this is not true for 
a disordered model, where on the contrary the tunnel among degenerate ground 
states that are not related by a trivial symmetry is the most interesting part of 
the dynamics. Here the random quenched magnetic field breaks the symmetry, 
and we want to explore the different ground state. Tempering allows us to do 
that. 

This was a very easy numerical experiments, but the steps we have described 
characterize also a more complex settings: for example we already told that 
the numerical simulations we are running now ([32]) on large lattices for the 3d 
Edwards Anderson spin glass are quite complex (but they work very well). 

Tempering is related to annealing. A trivial extension of annealing to non-zero 
values of T is impossible: annealing gives only information about energy, while 
for T > we want to deal with the free energy. We do not get from annealing 
any information about the entropic structure of the phase space. Tempering can 
be seen as such a generalization. 

Tempering could also turn out to be an effective global optimization scheme, 
even if this issue has non been looking in much detail until now. Very preliminary 
unpublished studies are by [39, 42]. The most important point is that tempering 
contains a built-in, self-implemented schedule. Setting up the schedule is the 
most serious problem with annealing, and tempering does it for us. 
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3.3 Parallel Tempering 

Parallel Tempering has been discussed by [43, 14, 19, 18]. We will describe it 
here. As we already said it is so simple that makes many of the details we have 
discussed with tempering only informational (we look at that as a plus). Also, the 
parallel tempering performs dramatically well, for example, on finite dimensional 
Edwards Anderson models. 

In figures (6) and (7) we have shown how we select the T values we use during 
the simulation. Let us say that by using the criteria we have defined before we 
need for example values of f3 a . Now in the parallel tempering approach, you 
run N<p\ simulations in parallel ( Nm\ different configurations C a of the system 
that evolve in the same quenched disordered landscape). Each copy starts with 
assigned a different (3 values. For example start with 

P(C ) = A,; /3(Ci) = pi, ... P(C a ) =&;... P(C N{f)) ) = Nw ■ (44) 
Now the composite Monte Carlo method is based on two series of steps: 

1. Update the copies of the system with an usual Metropolis sweep of 

Wc a • • • W % • (45) 

2. Swap the (3 values. 

• Propose the first /3-swap: the configuration that is at (3q can go to f3\ 
and that one that is at j3\ can go at Each spin configuration carries 
a (5- label. Configurations carrying adjacent j3- labels try to swap their 
labels. You use Metropolis for swapping (5 with the correct probability 
(see later). 

• Propose the second /9-swap: the configuration that is at f3\ can go to 
Pi and that one that is at Pi can go at f3\. 

• And so on, to the last couple of configurations: the configuration that 
is at Pn {/3) can go to /3jv (/3) -i and that one that is at /3jv (/3) -i can go at 

pN (p) ■ 

I stress the fact that you always try to swap adjacent /5-values (or the proce- 
dure would be not effective). At a given point in time configuration number 27 
can be at (3 and configuration number 3 at these are the two configurations 
you will try to swap. 

The /^-Metropolis swap: as usual, you will accept if the swap makes the 
energy decreasing, and will accept with probability e~ AS if it makes the energy 
increasing. You will have to compute 
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AS = S'-S (46) 

= ([3 a+1 E CfJa + (3 a E Cf3a+i ) - ((3 a E c ^ + (3 a+1 E Cf3a+i ) , 

where obviously E C/3a and E C(j +l do not change during the /3-swap. 

Here there is no freedom (and no need) for an additive term like the one we 
had before, g a . In this method once fixed the f3 values (the fixing of the (3 can 
be done loosely, and does not need fine tuning) there are no free parameters, and 
no fine tuning needed. The ensemble of the parallel tempering is very different 
from the one of tempering, even if the two methods look very similar. 

A possible way to look at the fact that we do not need the g parameters is 
the following. The g a were needed in order to prevent the system from collapsing 
to the state with lowest energy. But here there is a fixed number of /9-values. 
A given value of f3 a , for example /5 23 , cannot disappear. This fact makes things 
easy for us. 

3.4 The Thermalization 

Thermalization is a crucial issue. This is already true for usual spin models and 
usual local dynamics. We want to be sure that we are looking at a system that 
has reached equilibrium, and after that we want to be sure we have correlation 
times under control. In other words we are interested in studying the asymptotic 
equilibrium probability distribution, and we need to be sure that we are collecting 
the right information. In complex dynamics like tempering, involving multiple 
Markov chains, it is even more difficult to keep correlation times under control. 
The fact that we are typically studying complex systems, where slow dynamics is 
one of the most prominent feature, does not help. I will discuss here some details 
of the understanding we have reached about this issue. We will try to understand 
which kind of thermalization criteria we can adopt when using tempering. 

Let us start by describing what happens when we use a simple Metropolis 
dynamics for studying a system with quenched disorder. Here we will not give 
details, that can be found for example in [34], but I will only remind the reader 
that in this case one mainly deals with functions of the overlap 

9** = , (47) 

i 

where a and (3 label two replica's of the systems, i.e. two equilibrium configura- 
tions defined under the same realization of the quenched disorder. The overlap 
gives us information about how similar are two typical equilibrium configurations 
of the same system, q plays the role of the order parameter, and it is the equiv- 
alent of the magnetization m for usual spin models. The typical shape of the 
probability distribution of q, averaged over the quenched disorder J (see [34]), is 
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Figure 13: Pj{q) versus q. The flat part close to q = is the most remarkable 
feature of the behavior of a class of disordered systems. 

shown in fig. (13). The two peaks would be sharp (at ±m 2 ) for an usual spin 
model. The non-trivial part here is the fact that there is a continuous, non-zero 
contribution close to q ~ 0: the system can exist at equilibrium in many states, 
that can even be, in the infinite volume limit, completely different. The non- 
trivial part here is the fact that there is a continuous, non-zero contribution close 
to q ~ 0: the system can exist at equilibrium in many states, that can even be, 
in the infinite volume limit, completely different. 

In the case of a normal Metropolis update one can use two very strong criteria 
to check thermalization. 

1. Check symmetry of P(q) under q <-> —q for each sample. This is a very 
strong criterion. The main issue is here that the full flip of the whole 
system, {a} — > {— a} is the slowest mode of the dynamics. If you have done 
that well enough to get a good ±<x symmetry you have explored the whole 
phase space. In the simulation of a normal spin-model in the cold phase 
we would never be so demanding (if not interested to details of tunneling 
amplitudes): we would just seat in one vacuum and compute observable 
quantities there. As we already said, this is the main difficulty connected 
to the study of disordered systems. 

In fig. (14) we give two typical P(q) for two different given quenched 
realizations of the disorder. In the Parisi solution of the mean field (see 
[34]) you can detailed compute many properties: for example how many 
configurations contribute to the q ~ plateau. The numerical results in 3d 
show a very good similarity with the mean field results ([31, 32]). 
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This criterion can be adopted for tempering, but in this case it is not 
so strong anymore: in tempering methods the mode {a} — > {— er} is not 
necessarily a very slow mode. It is possible in this case that one gets 
a very symmetric P(q) that is not related to the asymptotic equilibrium 
distribution function. 

2. A second criterion, originally due to [6], that is very strong for usual dy- 
namics, is based on using two different definitions of the P(q). 

(a) After a random start from two different spin configurations simulate 
two independent copies of the system (in the same set of {J} cou- 
plings). This is the definition we had in mind till now. If we denote 
by a and r the two copies of the system we call q 2 

q2 = W(am), (48) 

i 

and Pi{q) the equilibrium probability distribution of q 2 . 

(b) In the second approach we use a dynamical measurement, at different 
times. We wait for a large Monte Carlo time separation t and we define 
(for t large enough) 

<ldyn = ^ E fa(*o)<Ti(*o + *)) , (49) 

i 

where eventually we will take an average over t - We define Pd yn {q) 
the equilibrium probability distribution of qd yn - 

When using the first definition at short times the two copies are not as similar 
as they will be asymptotically. Correlations only build up slowly. So P 2 (q) — * 
P{q) from below, and at short times P 2 (q) is centered at lower q values than 
P(q)- On the contrary at short times a(to) and a(t + 1) are correlated, i.e. qd yn 
at short times tends to be larger than the asymptotic value. So Pd yn (q) — > P(q) 
from above. Only for times large enough Pd yn (q) = ^2(9), and we use this 
condition to check thermalization. 

Also the extension of this second criterion to tempering is not straightforward, 
since now dynamics is assigning to different configurations different (3 values. 

For parallel tempering runs, for example, we use three main conditions to 
check thermalization. 

1. We check that all observables are not drifting in time. We look for example 
at the energy E and at q 2 . We check, on logarithmic time scales, that they 
have safely converged to a stable value. We also explicitly look at the full 
Pj(q), and check it has no drift. 
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Figure 14: Two typical Pj(q) versus q, for two different realizations of the 
coupling. 

2. We check that the spacing of the allowed f3 a is small enough to guarantee 
a good acceptance ratio for the proposed (5 swaps. 

3. We demand that each of the configurations C a must have visited each 
of the allowed f3 a values with similar frequency. If we have done ten millions 
sweeps and we have ten allowed (5 values we demand that each configura- 
tions has spend more or less one million sweeps in each (3 value. We keep 
a set of counters to check that. We want to be able to detect situations 
where some systems are confined in a part of the phase space, and a good 
acceptance factor is not enough to avoid that (the configurations could just 
be flipping locally in (3 space). 

3.5 More Comments 

It is interesting to note some other technical comments about tempering like 
methods, mainly thinking about the large volume scaling behavior and about the 
performances of the method. Here we go, with a slightly miscellaneous series of 
comments. 

When you increase the lattice size you need a larger number of allowed f3 a 
values, i.e. a larger N^p), to sample the /3-space, and to reach the region where 
free energy barriers disappear. The method is critically slowed down, but only 
like a power law. 

Again, disordered systems need more attention than normal systems. You 
have to check that for each realization of the quenched disorder {J} the condition 
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that all C a have visited all (3 a values for similar periods of time is satisfied. From 
our experience the situation seems quite sharp: when the method works it works 
well, when it does not work it is a disaster (i.e. some visiting times are of order 
zero and some are of the order of the total time). For normal tempering (but 
parallel tempering seems preferable in all known cases) the constants {g a } have 
to be tuned separately in each sample. 

Let us discuss in better detail about volume scaling in tempering like methods. 
Let select the allowed f3 values at a constant distance 5: 

P a+1 =p a + S . (50) 
The probability for accepting a /3-swap is 

PswMPcPa+i) =e- A , (51) 

and 

- hg(PswAp) = A = 5 ■ (S(C Pa+1 ) - S(C P J) ~ 5^ . (52) 
So if the specific heat is not diverging we want to select 

S 2 N = constant , 5 ~ , i.e. Np ~ . (53) 

At a second order phase transition point, where the specific heat diverges, we 
have to be slightly more careful, since the number of intervals we need turns out 
to be higher. One has 
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f ~ \T-T C \~ V , C v ~ \T-T C 

\T-T C \ ~ ~ , 

C v ~N^. (54) 

So we get 



S 2 N 1+ ^ = constant , N p ~ 5' 1 ~ N& 1+ & , (55) 

that is our final estimate. 

The choice of the set {f3 a } is not crucial (not like the g a 's in the serial tem- 
pering). We can select the same set for all the realizations of the disorder, paying 
only the price of loosing some small amount of efficiency 

In fig. (15) we sketch (the picture only has a pictorial role) the correlation 
times computed by [18]. Filled dots are for tempering, empty dots for multi- 
canonical. Simulations are for a 3d Edwards Anderson spin glass, with couplings 
J = ±1. 32/5 values have been allowed in the parallel tempering run, for all 
N values, r is defined as the typical time needed from a spin configuration for 
going from the cold to the warm phase. It is worth noticing that in the parallel 
tempering, given the f3 set, there are no parameters to be fixed. 
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Figure 15: Correlation times for multicanonical and for tempering, for different 
lattice sizes (in log- log scale). The figure is only sketchy, and the interested reader 
can find the exact data point in fig. 2 the original paper by [18]. Filled dots are 
for tempering, empty dots for multicanonical. 

3.6 Umbrella Sampling and Reweighting 

All the idea we have discussed up to now are related to the technique of Umbrella 
Sampling by [44, 45] (mainly developed for simulations of liquid systems). Let 
us show now how we can relate that to reweighting techniques. 
We consider an observable quantity A 

_ J{dC}e-^A({C}) 
{ " J{dC}e-^ c }) 

= J{dC}n p ({C})A({C}), (56) 
where 7Tg is the Boltzmann equilibrium distribution at f3, 

e -PS({C}) 

= — = • (57) 

So, now, we want to improve things: maybe dynamics at (5 is slow, or we have 
already data at $ and we want an added bonus, or maybe we want to measure 
something more fancy (see earlier in the text). So we write 
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J{dC}e-^A({C})l ( j{dC}n \ 
{ )p f{dC}e-MicM \I{dC}f) ' { } 

where we have inserted a large number of ones. We find in this way that 

/ Ae-f> s \ 

(A)p = , (59) 

where the expectation values are taken now over the n probability distribution. 
If we choose 

-ps 

Tf = e — (60) 

we find the usual reweighting, as we have already discussed. But now we can 
also use the most general umbrella sampling by [44, 45]. The relation we have 
just derived, (60), is value for a generic probability distribution jr. tt does not 
need to have the form of a Boltzmann distribution at some value of /3, it can be 
everything you like. This is umbrella sampling: open 7r — > jr, as an umbrella, to 
cover all of the parameter space in the region of interest. For example you can 
take 

H{C}) = ^(P a )e-^S({C}). (61) 

a 

Selecting the u such that you get an equal sampling of the j3 points gives u ~ , 
i.e. the choice of tempering. 



3.7 Multicanonical Methods 

A large amount of work has been done on the so called multicanonical methods, [3, 
4, 1, 2], that are very powerful in order to study discontinuous phase transitions. 
We discuss here a simple example, [4], where the 2d, 10 states Potts model is 
analyzed. The results we describe, by [4], are for lattices up to 100 2 . The action 
has the form 



S=Y, S w Si = 0,l,...,9. (62) 

(iJ) 

Such a model undergoes a temperature driven strong first order phase transition. 
A hard problem is to compute the interfacial free energy between the disordered 
and the ten ordered states. 

On a finite lattice there are no phase transitions: we define /3£, the pseud- 
ocritical coupling on a lattice of linear size L such that the two peaks in the 
probability distribution of the internal energy have the same height. In fig. (16) 
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Figure 16: The probability distribution of the energy for different lattice sizes. 



(from [4]) the probability for different distribution of the energy lattice sizes: on 
larger lattices the probability of getting a configuration in the forbidden region 
becomes smaller and smaller. One has that 

When using a normal Metropolis dynamics eq. (63) implies that the tunneling 
time diverges severely with the lattice size: 

TMetropoUs ^ AL a e° L ^ . (64) 

Multicanonical method takes a different approach, by modifying the sampling 
probability distribution. Here one samples phase space with weight 

pMCan _ e a\-HlS ^ fagk < g < (55) 

(instead of usual P^ olt ~ c^ i3lS ). One has partitioned the action range in intervals 
Ik, using a different action in each range. Now one chooses intervals Ik and 
parameters a\, b\ such that pj- MCan ^ \ s fl a t : fig. (17) shows that it can be done 
very successfully. 

Configurations that were exponentially suppressed are now enhanced by the 
Multicanonical action: the interested reader can look at details about fixing the 
parameters in the original papers, [3, 4, 1, 2]. 

In the case of the Multicanonical algorithm after data collecting you need 
reweighting (as opposed to tempering, where the output data at each f3 value 
are directed distributed according to Boltzmann) to reconstruct the Boltzmann 
distribution. At f3 c L 

pBolt = e ms-al-al pMCan _ ^ 

The improvement is dramatic, and an exponential slowing down becomes power 
like ([4]: the same happens for tempering, for example in the cold phase plus 
to minus tunneling in the Ising model, [11]). An estimate of [4] gives r^ Can ~ 
0.7L 2 - 7 , versus T ^ eatBath ~ 1.5L 2 - 15 L a08L . 
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Figure 17: The Boltzmann and the Multicanonical probability distribution of 
the energy for L = 70. The multicanonical distribution is quite flat, allowing 
easy transitions among the two sides of the first order phase transition. 
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